refactor: framework de-coupling — generic runtime, ASR as use case (v1.1)#2
Merged
Conversation
Replace hardcoded INC-YYYYMMDD-NNN session-id format with config-driven prefix. Apps declare their prefix via FrameworkAppConfig.session_id_prefix (default "SES", validated 1-16 chars, alphanumeric + hyphens). The prefix threads through SessionStore -> Session.id_format -> repository. Configs: - incident_management.yaml: INC - code_review.runtime.yaml: REVIEW - config.yaml (default): SES (also explicitly INC for asr-default) Validation rejects empty / whitespace / symbols / underscore / >16-chars at config-load time via field_validator. First step toward v1.1 framework de-coupling — runtime no longer hardcodes the incident-management session-id convention. Tests: 869 passed, 3 skipped (16 new parametrized tests in tests/test_session_id_prefix.py covering default, custom prefixes INC/REVIEW/HR/MY-APP/16-char, and invalid inputs). dist/* bundles regenerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module src/runtime/terminal_tools.py introduces the type system
for the generic terminal-tool registry that replaces the hardcoded
_TERMINAL_TOOL_RULES table in orchestrator.py and the
_TYPED_TERMINAL_TOOLS frozenset in graph.py (consumed in Wave 2).
- TerminalToolRule: tool_name + status + extract_fields dict, all
extra="forbid" to reject typos in app YAML at boot (D-06-01).
- extract_fields generalises the v1.0 _extract_team(team_keys)
positional-tuple lookup into a {dest: [args.X|result.X]} dict so
apps name fields whatever fits (team, severity, reviewer) (D-06-02).
- StatusDef: name + terminal + kind (Literal of 5 categories), no
color or label (UI presentation owned by UIConfig.badges per
D-06-05 rejected alternative).
The new module is vocabulary-free: zero references to
mark_resolved/mark_escalated/notify_oncall/etc, satisfying
DECOUPLE-02 for the framework side.
Add three new fields and a cross-field model_validator to
OrchestratorConfig so apps can declare their terminal-tool rules
and status vocabulary in YAML without writing Python (D-06-03).
Fields:
- terminal_tools: list[TerminalToolRule] (D-06-01)
- statuses: dict[str, StatusDef] (D-06-05)
- default_terminal_status: str | None (D-06-06)
Cross-field invariants enforced at config-load (raises
pydantic.ValidationError):
- statuses non-empty => default_terminal_status required
- default_terminal_status must reference a declared status
- referenced default status must have terminal=True
- every terminal_tools[i].status must reference a declared status
- empty statuses with non-empty terminal_tools or set
default_terminal_status is rejected (config mistake)
Bare OrchestratorConfig() still constructs with empty registry —
the framework default for unconfigured apps stays valid. Also
tightens model_config = {"extra": "forbid"} on OrchestratorConfig
so unknown YAML keys fail loudly at boot, matching the
extra="forbid" pattern used by sibling configs in this file.
Wave 2 (06-02) will consume these fields in
_finalize_session_status / _harvest_typed_terminal and ship the
same-PR incident_management.yaml registration (D-06-04).
tests/test_status_vocabulary.py exercises the new
OrchestratorConfig.{terminal_tools,statuses,default_terminal_status}
plus the TerminalToolRule / StatusDef pydantic models. Covers
DECOUPLE-03 acceptance: framework rejects unknown statuses and
unknown defaults at config-load, not at gateway-eval time.
Test coverage:
- TerminalToolRule shape: minimal + extract_fields round-trip +
extra="forbid" rejects unknown keys (D-06-01, D-06-02)
- StatusDef shape: minimal + Literal kind enforcement +
extra="forbid" rejects color (D-06-05 — UI presentation leak)
- OrchestratorConfig happy path with mixed rules and statuses
- bare OrchestratorConfig() still constructs (framework default
invariant)
- default_terminal_status without statuses rejected
- terminal_tools without statuses rejected
- default required when statuses non-empty
- default must reference a declared status (error names valid keys)
- default must be terminal=True (in_progress fails)
- terminal_tools[i].status must reference a declared status (error
names the index and the offending value, both for index 0 and
later indices)
- extra="forbid" on OrchestratorConfig survives the new fields
All 16 tests pass; full suite at 885 passed (was 869).
Two new test files; both fail until Tasks 2.2/2.3/2.4 land: - tests/test_concept_leak_ratchet.py — DECOUPLE-06 binary ratchet walking src/runtime/ for the 6 forbidden tokens (mark_resolved, mark_escalated, notify_oncall, submit_hypothesis, update_incident, apply_fix). Currently fails because _TERMINAL_TOOL_RULES still lives in orchestrator.py and _TYPED_TERMINAL_TOOLS in graph.py. - tests/test_terminal_tool_registry.py — 9 finalize-path integration tests covering both incident_management and a synthetic code_review-style registration. Currently fails because Orchestrator._infer_terminal_decision / _extract_field are not yet instance methods (D-06-08). RED phase commit per the v1.1 milestone TDD discipline. GREEN follows in the atomic framework-migration commit.
…rized escalate (DECOUPLE-02, DECOUPLE-03, DECOUPLE-06) Migrates `_finalize_session_status` and the typed-terminal harvester to read app-registered rules off `OrchestratorConfig` instead of the v1.0 hardcoded `_TERMINAL_TOOL_RULES` / `_TYPED_TERMINAL_TOOLS` tables. Ships the framework refactor and the matching incident_management YAML registration in a single atomic commit per D-06-04. Decisions cited: - D-06-02: extract_fields lookup syntax preserved (args.X / result.X). - D-06-03: registry lives in app YAML; pydantic-validated at boot. - D-06-04: same-PR atomic commit — incident_management end-to-end flow stays green throughout. - D-06-05: StatusDef.kind drives the escalated_to mirror dispatch rather than hardcoded vocabulary. - D-06-06: default_terminal_status is app-owned (incident_management uses needs_review; code_review uses unreviewed). - D-06-08: `_finalize_session_status` reads `self.cfg.orchestrator` directly via instance method shape. Resolutions applied: - Resolution A (escalate path) — Option B: parameterised on the new `OrchestratorConfig.escalate_action_tool_name` and `escalate_action_default_team` fields. The orchestrator's `resume_session(action="escalate")` looks up the matching rule in `terminal_tools`, drives status assignment from `rule.status`, and pulls extra-field destination key from `rule.extract_fields`. Hardcoded `notify_oncall` / `platform-oncall` / `escalated_to` literals removed from `orchestrator.py`. - Resolution B (ratchet scope) — Option 3: `RATCHET_ALLOWLIST` carries 8 entries with phase-handles (`Phase 7` for `mcp_servers/remediation.py`; `Phase 8` for `terminal_tools.py` docstrings + `storage/event_log.py` docstrings). Companion meta-test asserts each entry still matches a real line — when Phase 7 / 8 ship and the offending file is gone, the meta-test fails and forces stale-entry deletion. Allowlist is shrink-only. New OrchestratorConfig fields: - `patch_tools: list[str]` — tools whose `args.patch` blob the harvester folds into agent confidence/signal/rationale. Avoids a third hardcoded `update_incident` leak by routing the recognition through YAML. - `harvest_terminal_tools: list[str]` — tools the harvester treats as typed-terminal for confidence capture but the orchestrator's finalize path does NOT transition status on. Used for `submit_hypothesis`-style stage-complete signals. - `escalate_action_tool_name: str | None` — MCP tool the orchestrator invokes when a user clicks Escalate. None means no side-effect. - `escalate_action_default_team: str | None` — fallback team when the user did not pick one. Same-PR YAML registration shipped in: - config/incident_management.yaml - config/config.yaml - config/config.yaml.example - config/code_review.runtime.yaml (placeholder, full migration in Phase 8 per D-06-04) Test fixtures updated to provide `cfg` to the finalize-only stub orchestrators (test_finalize_status_inference.py, test_session_lock.py, test_finalize_concurrent.py) and to pass `terminal_tool_names` / `patch_tool_names` to `_harvest_tool_calls_and_patches` / `make_agent_node` (test_harvester_typed.py, test_agent_node.py, test_gate.py). The `cfg` fixture in test_resume.py now mirrors the incident_management YAML registration so the parameterised escalate path flows through unchanged behaviour for legacy callers. Verification: - pytest: 899 passed (vs 870 baseline pre-Phase 6). - ratchet `git grep` — every match is allowlisted and explained. - ruff: clean on every touched file. - All three production YAMLs (config.yaml, config.yaml.example, code_review.runtime.yaml) validate against the new OrchestratorConfig schema. dist/ bundle regen deliberately deferred to Plan 06-03.
… close-out
Phase 6 close-out (Generic Terminal-Tool Registry + Status Vocabulary).
Final wave: regenerated dist/app.py, dist/apps/incident-management.py,
dist/apps/code-review.py from the post-Phase-6 src/runtime/ + examples/
tree. dist/ui.py regenerated to byte-identical content (config-driven UI
shell already generic; no v1.1 surface changes). The bundles now embed
TerminalToolRule + StatusDef pydantic models, drop the v1.0
_TERMINAL_TOOL_RULES table and _TYPED_TERMINAL_TOOLS frozenset, and
carry per-rule extract_fields generalising the v1.0 _extract_team
lookup.
Phase 6 final acceptance:
- git grep leak-ratchet: 9 matches, all allowlisted (2 Phase 7,
7 Phase 8) per RATCHET_ALLOWLIST in test_concept_leak_ratchet.py
- pytest -q: 899 passed (Phase 5 baseline 873; +28 from Phase 6:
+14 status_vocabulary, +9 terminal_tool_registry, +5 ratchet)
- bundle smoke tests: 12 passed
- ruff src/runtime/ tests/ examples/: clean
Phase 6 totals: 3 plans, 11 tasks, 6 commits across 3 waves
(8a249f3, 700b4e1, d7e8b26, 1b43017, 070e15d, this commit).
Closes DECOUPLE-02, DECOUPLE-03, DECOUPLE-06.
STATE.md and 06-03-SUMMARY.md document the full hand-off to Phase 7
(DECOUPLE-04) + Phase 8 (DECOUPLE-05 + 07) — both files are gitignored
per project policy and NOT included in this commit.
Ready for /gsd-transition to Phase 7.
…covery (DECOUPLE-04)
Phase 7 / DECOUPLE-04 — single atomic commit per D-07-04. Framework no
longer hardcodes incident-vocabulary MCP servers; apps declare their
per-tool servers via a generic configurable dotted-path list.
* D-07-01 — moved (git mv, history preserved):
src/runtime/mcp_servers/{observability,remediation,user_context,__init__}.py
-> examples/incident_management/mcp_servers/
* D-07-02 — new field OrchestratorConfig.mcp_servers: list[str]
(src/runtime/config.py). YAML wiring:
config/config.yaml + config/incident_management.yaml + config.yaml.example
declare orchestrator.mcp_servers with the 3 dotted paths under
examples.incident_management.mcp_servers.*
config/code_review.runtime.yaml declares orchestrator.mcp_servers: []
— empty-list branch verified clean (dist/apps/code-review.py
contains zero examples.incident_management.mcp_servers refs).
* D-07-03 — converted setters to register(mcp_app, cfg) contract:
observability.set_environments -> register(mcp_app, cfg) closing over
cfg.environments via _make_environment_validator (snapshotted; no
module-level mutable list).
remediation.set_escalation_teams -> register(mcp_app, cfg) closing
over cfg.framework.escalation_teams (or cfg.escalation_teams
fallback) into a module-level snapshot tuple.
user_context -> no-op register(mcp_app, cfg) for contract uniformity.
Chosen contract: two-arg register(mcp_app, cfg). The orchestrator
passes mcp_app=None — modules expose their own per-module FastMCP
instance composed by the loader; mcp_app exists for future
composition needs.
* orchestrator.py:357-365 (now ~365-380): replaced the silent-swallow
try/except setter calls with a single importlib-driven loop:
for module_path in cfg.orchestrator.mcp_servers:
mod = importlib.import_module(module_path)
reg = getattr(mod, "register", None)
if reg is None:
raise RuntimeError(
f"orchestrator.mcp_servers entry {module_path!r} does "
f"not expose a `register(mcp_app, cfg)` callable"
)
reg(None, cfg)
No more silent except — missing register raises explicitly per the
must_haves contract.
* Bundler patched: scripts/build_single_file.py
RUNTIME_MODULE_ORDER:66-68 (the three hardcoded mcp_servers paths)
removed from the framework-only block; the equivalent entries added
to INCIDENT_APP_MODULE_ORDER (under EXAMPLES_ROOT) so the per-tool
modules ship in dist/apps/incident-management.py without leaking
into dist/app.py (framework-only) or dist/apps/code-review.py.
* Bundles regenerated: dist/app.py, dist/ui.py,
dist/apps/incident-management.py, dist/apps/code-review.py.
* Import-site flip: ~60 sites across 19 test files + 4 config YAMLs
changed `runtime.mcp_servers.*` -> `examples.incident_management.mcp_servers.*`.
`git grep -E 'runtime\\.mcp_servers|runtime/mcp_servers' -- src/ tests/
apps/ scripts/ examples/ config/` returns zero hits.
* Ratchet allowlist (tests/test_concept_leak_ratchet.py): removed the
two `src/runtime/mcp_servers/remediation.py` entries (`apply_fix`,
`notify_oncall`) and the explanatory Phase-7 comment. The
`test_allowlist_entries_actually_match` meta-test stays GREEN
because the file no longer exists under that path — leaving the
entries would have failed the meta-test (stale allowlist).
* Test contract migration: tests/test_notify_oncall_team_required.py
was importing the now-removed `set_escalation_teams` directly. It
now binds the roster via the public `register(mcp_app, cfg)`
contract with a SimpleNamespace cfg — same observable behavior,
same three test cases. tests/test_resume.py cfg fixture: added
`mcp_servers=[...]` to OrchestratorConfig so the orchestrator's new
discovery loop binds the escalation roster (otherwise the snapshot
would carry over from prior tests, a fixture-completeness bug
exposed by the contract change — Rule 3).
Verification:
* git ls-files src/runtime/mcp_servers/ -> empty
* git ls-files examples/incident_management/mcp_servers/
-> 4 files (__init__, observability, remediation, user_context)
* full pytest suite: 899 passed (baseline preserved)
* ruff check (touched files): All checks passed
* dist/apps/incident-management.py: contains the 3 relocated MCP
server bodies (verified via grep for FastMCP("observability") etc.)
* dist/apps/code-review.py: contains zero
examples.incident_management.mcp_servers references — D-07-02
empty-list branch GREEN.
Closes DECOUPLE-04 from REQUIREMENTS.md.
…t (DECOUPLE-05, DECOUPLE-07) — v1.1 milestone close
Closes milestone v1.1 (Framework De-coupling). Single atomic commit per
D-08-04 covering DECOUPLE-05 + DECOUPLE-07 + ratchet binarization.
DECOUPLE-05 (D-08-01, D-08-02):
* OrchestratorConfig.state_overrides_schema: str | None — dotted path
to a pydantic BaseModel subclass (`mod.path:Class` or `mod.path.Class`).
* importlib resolution at Orchestrator.create(); bad path raises
RuntimeError with the offending path AND class name. Non-BaseModel
targets are caught explicitly.
* start_session(state_overrides=…) runs cls.model_validate(...) after
_coerce_state_overrides and before store.create. None skips
validation entirely — D-08-02 backward compatibility.
* New IncidentStateOverrides (environment, severity) and
CodeReviewStateOverrides (pr_url, repo, base_branch, pr_number).
Both extra='forbid' so typos surface at session-start, not at
gateway-eval (PVC-06 generalization).
* All three runtime YAMLs declare state_overrides_schema.
DECOUPLE-07 (D-08-03):
* TerminalToolRule.match_args: dict[str, str] = {} — NEW optional
argument-value discriminator. Empty default preserves v1.0 single-
rule shape; non-empty restricts the rule to tool calls whose
args[k] == v for every declared key. Generic feature any app can
use (priority, severity, recommendation, …).
* code_review.runtime.yaml: 5-status vocab + 3 set_recommendation
rules dispatching by args.recommendation -> approved /
changes_requested / commented; default_terminal_status: unreviewed.
* tests/test_code_review_e2e.py (6 tests): real Orchestrator from
YAML + synthesized set_recommendation ToolCall + finalize hook.
Asserts code_review-vocabulary status (NOT unreviewed/needs_review),
state_overrides validation rejects unknown keys + cross-app shapes,
no incident_management modules in sys.modules after the test.
Concept-leak ratchet binary closure:
* Scrubbed mark_resolved/mark_escalated/notify_oncall/apply_fix from
src/runtime/terminal_tools.py:19-20,45-47 and
src/runtime/storage/event_log.py:3,37. Replacement uses neutral
<terminal_tool> / set_recommendation placeholders.
* RATCHET_ALLOWLIST = {} (was 5 entries).
* git grep '\b(mark_(resolved|escalated)|notify_oncall|submit_hypothesis|update_incident|apply_fix)\b' src/runtime/ — zero matches.
Bundler:
* INCIDENT_APP_MODULE_ORDER and CODE_REVIEW_APP_MODULE_ORDER list
their app's state.py first. dist/* regenerated; both schema
classes embedded; ast.parse() OK.
Tests: 927 passed (was 899) — 22 new schema + 6 new e2e. ruff clean
on all touched files. RATCHET_ALLOWLIST: {}.
Closes DECOUPLE-05 and DECOUPLE-07 — 7/7 v1.1 DECOUPLE-* shipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_with_schema CI was missing OLLAMA_API_KEY env var, causing _interpolate to KeyError when loading config/config.yaml. Tests should not depend on real secrets — set a placeholder via monkeypatch so the YAML parses without making real Ollama calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ds_with_schema
Prior fix only set OLLAMA_API_KEY but config.yaml has 5 `${VAR}` refs:
OLLAMA_API_KEY, AZURE_ENDPOINT, AZURE_OPENAI_KEY, EXTERNAL_MCP_URL, EXT_TOKEN.
Set placeholders for all of them so _interpolate succeeds in CI without any
real secrets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Summary
v1.0 (PR #1) made workflow correctness structural — typed tools, schema-validated boundaries, per-session locks. But the runtime itself still hardcoded ASR-incident-management vocabulary:
_TERMINAL_TOOL_RULESknewmark_resolved/mark_escalated/notify_oncallby name,runtime/mcp_servers/observability.pylived in the framework dir,Session.id_formatreturnedINC-*. Smoking gun:examples/code_review/was structurally broken — every code_review session would land atneeds_reviewbecause the framework didn't recognize its terminal tools.This PR makes the runtime structurally framework-generic. ASR is now a use case, not the framework.
What changed (4 phases, 12 commits, +3.4k LOC, 927 tests passing)
FrameworkAppConfig.session_id_prefix(1-16 chars, validated). Apps configure:INC/REVIEW/SES. 16 boundary tests.src/runtime/OrchestratorConfig.terminal_tools+statuses+default_terminal_status. Genericextract_fieldsfor team/severity/etc. metadata capture. Parameterized escalate path (escalate_action_tool_name+escalate_action_default_team). Concept-leak ratchet test enforces zero incident-vocabulary in framework.runtime/mcp_servers/{observability,remediation,user_context}.py→examples/incident_management/mcp_servers/. Framework discovers viaOrchestratorConfig.mcp_servers: list[str]dotted paths. Apps own their config-binding contracts (register(mcp_app, cfg)). 80 import sites flipped.OrchestratorConfig.state_overrides_schema(dotted-path pydantic class).IncidentStateOverrides+CodeReviewStateOverridesboundary tests.tests/test_code_review_e2e.pywalks an actual code_review session and asserts terminal status from code_review's vocabulary, notneeds_review. GenericTerminalToolRule.match_argsdiscriminator added to support code_review'sset_recommendation(recommendation=approve|request_changes|comment)single-tool dispatch. Binary ratchet —RATCHET_ALLOWLISTis now empty.Tests
git grep -E 'mark_(resolved|escalated)|notify_oncall|submit_hypothesis|update_incident|apply_fix' src/runtime/returns zero matchesmatch_args;sys.modulessnapshot diff confirms zeroincident_management*imports leaked into code_review sessionWhat the framework no longer knows
INC-*)mark_resolved,mark_escalated,notify_oncall,submit_hypothesis,update_incident,apply_fix)escalated,resolved,needs_review— all app-declared now)extract_fieldsOrchestratorConfig.mcp_serversThese are now structural guarantees enforced by the binary leak ratchet.
Migration impact
terminal_tools,statuses,default_terminal_status,escalate_action_tool_name,mcp_servers,state_overrides_schema. Existing skills, MCP server, and runtime behavior unchanged. End-to-end test suite passes.terminal_tools(3 rules dispatchingset_recommendationviamatch_args),statuses(approved/changes_requested/commented/unreviewed),state_overrides_schema. e2e test verifies the genericity invariant.Known follow-ups (deferred to v1.2 Hardening)
ollama.comfallback URL, ApprovalWatchdog leak, singleton thread-safety, dist staleness CI gate, ui.py zero-coverage. All tracked in.planning/REQUIREMENTS.mdv2 section.Test plan
🤖 Generated with Claude Code